
perf: Rework ExecuteReaderAsync to minimize allocations #528

Merged: 5 commits, Jul 21, 2020

Conversation

@Wraith2 (Contributor) commented Apr 16, 2020

Benchmarking against the usual DataAccessPerformance project, I've got all the way down to the once-per-call allocations now. ExecuteDataReader uses callback delegates and closures which can be cached, eliminated, or minimized in many cases.

The AAsyncCallContext class is very similar to the one already used internally in SqlDataReader because the pattern of allocation and caching is similar. Once this PR is reviewed and merged I intend to go back and rebase the SqlDataReader internal classes on this new base class.

The changes here come in a number of patterns:

  1. replace an instance-bound delegate, which must be allocated on each call, with a cached static delegate that takes the instance as a new first parameter and bounces the call back through to an instance method.
  2. move continuations into state functions and pass state as parameters wherever possible, to allow caching and avoid closures.
  3. replace compiler-generated closures (which always capture at method scope) with an explicit, limited-scope state object.
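Pattern 1 might look something like the following sketch (the class and method names here are illustrative, not the actual SqlClient code):

```csharp
using System;
using System.Threading.Tasks;

public class Worker
{
    // Allocating version: the lambda captures 'this', so a new closure
    // object and delegate are allocated on every call.
    public Task<int> RunAllocating() => Task.Run(() => Execute());

    // Cached version: one static delegate shared by all instances. The
    // instance travels through the state parameter, so nothing is captured
    // and the delegate is allocated exactly once, at type initialization.
    private static readonly Func<object, int> s_execute =
        state => ((Worker)state).Execute();

    public Task<int> RunCached() => Task.Factory.StartNew(s_execute, this);

    private int Execute() => 42;
}
```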

I also changed a void-returning function that was altering a ref parameter into one that returns the new value, because the more familiar pattern makes it easier to understand what it's doing.
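A minimal before/after sketch of that ref-parameter change (hypothetical names, not the actual SqlClient code):

```csharp
using System.Threading.Tasks;

internal static class TaskHelpers
{
    // Before: void-returning, mutates the caller's variable through ref.
    internal static void WrapOld(ref Task<int> task)
    {
        task = task.ContinueWith(t => t.Result * 2);
    }

    // After: returns the new task; the caller assigns the result,
    // which is the more familiar shape.
    internal static Task<int> WrapNew(Task<int> task)
    {
        return task.ContinueWith(t => t.Result * 2);
    }
}
```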

Some of the allocations eliminated are shown in the attached dotMemory screenshot (Capture2); there are some more off the bottom of the screenshot.

The same work can be done for ExecuteScalar eventually.

@DavoudEshtehari (Member) left a comment

Can I ask you to describe how you got this report, and what you mean by the DataAccessPerformance project?

@Wraith2 (Contributor, Author) commented Jun 24, 2020

The DataAccessPerformance project emulates the TechEmpower fortunes benchmark and I was pointed to it as a good performance based benchmarking solution. If you look at my history of pull requests here and in corefx you'll see I use it a lot and have been gradually improving performance and memory usage using it.

The screenshot is from dotMemory, which is a memory profiler. I highlighted the items that I've affected in red using Paint.


```csharp
private Task<DbConnectionInternal> CreateReplaceConnectionContinuation(Task<DbConnectionInternal> task, DbConnection owningConnection, TaskCompletionSource<DbConnectionInternal> retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionPoolGroup poolGroup, CancellationTokenSource cancellationTokenSource)
{
    return task.ContinueWith(
```
Member

Small comment @Wraith2: ContinueWith generally performs much worse than a simple async method with await (not to mention that it's more brittle wrt exception handling etc.). I can post some benchmarking if you want.
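To illustrate the general remark (this is not the SqlClient code itself, just a sketch of the two styles):

```csharp
using System;
using System.Threading.Tasks;

internal static class ContinuationStyles
{
    // ContinueWith: allocates a continuation delegate per call; a faulted
    // antecedent surfaces as AggregateException when t.Result is touched.
    internal static Task<int> LengthContinueWith(Task<string> task) =>
        task.ContinueWith(t => t.Result.Length);

    // async/await: the compiler-generated state machine completes
    // synchronously when the task is already done, and the original
    // exception type is preserved on fault.
    internal static async Task<int> LengthAsync(Task<string> task)
    {
        string s = await task;
        return s.Length;
    }
}
```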

Member

Though this may be totally justified if TaskContinuationOptions.LongRunning is important here - just making the general remark.

Contributor Author

Language-supported async/await would be easier to understand as well.
Whoever wrote the original code had some reason to use imperative constructs instead of language support, but I've no idea what it was.

Member

Language supported async await would be easier to understand as well.

Absolutely.

Whoever wrote the original code had some reason to use imperative constructs instead of language support but I've no idea what it was.

I suspect a lot of the code was simply written before async/await was introduced (in .NET 4.5, relatively "late"). I'd definitely consider replacing the callback approach as you work through the code.

Contributor Author

Fine with me but it's down to the MS team to make that sort of decision. I don't want to make working on this library any harder than it already is so I try to replicate the existing style wherever it's sensible to do so in my changes.

There are several library wide things that would be nice to do.

  1. Merge the netfx and netcore codebases into a single solution with documented reasons for any differences
  2. Change how async methods are implemented with snapshots to reduce performance issues.
  3. Change from imperative async to language async.

We've got 1 in progress, but the others really need 1 to be done and judged stable so we don't have to duplicate a lot of work. 1 will also bring perf improvements done in netcore to netfx.

Contributor

I vote for prioritizing 1. It is currently quite a confusing codebase, which makes community contributions harder than they should be.

Contributor Author

Working on it bit by bit in #625. The easy part is the identical files; the complex part is the files which have opposing or complex changes.

Member

Sure, this kind of cleanup can be done separately of course. I'd suggest that if ContinueWith continuations are being rewritten for unrelated reasons anyway, it may be a good opportunity to just do that with async/await, but of course that's up to the team and you.

I also totally agree that merging the netfx/netcore codebases is a high-priority task.

```diff
@@ -1864,14 +1873,22 @@ private static void ChangePassword(string connectionString, SqlConnectionString
 // this only happens once per connection
 // SxS: using named file mapping APIs

-internal void RegisterForConnectionCloseNotification<T>(ref Task<T> outerTask, object value, int tag)
+internal Task<T> RegisterForConnectionCloseNotification<T>(Task<T> outerTask, object value, int tag)
```
Member

This could be a candidate for rewriting with simple async/await

Contributor Author

@David-Engel @cheenamalhotra I can give this a try in the PR if you like or we can leave it for a later date, what do you think?

Member

Are there more similar cases that we would like to address?

Contributor Author

The original code took a ref Task, added a continuation to it, and then returned the new wrapper task in the same variable. It doesn't do any awaiting, and if the caller were rewritten to use language async then this function would be totally rewritten or removed as well. I'd prefer not to touch it further in this PR and to work on it at a stage when a larger scale transition from continuation to language async is being done.

@cheenamalhotra cheenamalhotra added the 📈 Performance Use this label for performance improvement activities label Jun 25, 2020
@DavoudEshtehari (Member)

Can I ask you to produce some information to illustrate how these changes affect performance?

@johnnypham johnnypham self-assigned this Jul 8, 2020
@Wraith2 (Contributor, Author) commented Jul 11, 2020

Using Northwind and the benchmark code:

```csharp
[Benchmark]
public async Task<int> AsyncReadOrders()
{
    int max = 0;

    using (SqlDataReader reader = await command.ExecuteReaderAsync())
    {
        while (await reader.ReadAsync())
        {
            int value = reader.GetInt32(0);
            if (value > max)
            {
                max = value;
            }
        }
    }

    return max;
}
```
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|
| AsyncReadOrders (branch) | 353.9 μs | 6.93 μs | 9.71 μs | - | - | - | 2.15 KB |
| AsyncReadOrders (master) | 349.1 μs | 6.92 μs | 8.24 μs | - | - | - | 2.59 KB |

It's a really tiny benchmark, as you can see from the times. It simply gets some ints using a data reader and reads them. The main push is the allocated memory. Very little of those allocations is user data; they're task machinery and library internals. The reduction you see is delegates, which are removed by caching static versions of them, and some async closures, which are removed by using direct state objects to avoid capture.

@johnnypham (Contributor)

I was able to verify the difference in allocations but am I correct in thinking that the benefits would mostly be lost when returning larger result sets for each call to ExecuteReaderAsync?

@Wraith2 (Contributor, Author) commented Jul 13, 2020

The benefits wouldn't be lost; they'd just be a smaller share of the total memory allocated if you do a lot of reading. If people are making an effort to do sensible things, chunky not chatty, then this is just a nice-to-have; if they're being bad at SQL and doing chatty calls, then this is nicer to have. It's never fundamental.

In an ideal situation I'd like the library to only allocate memory for the results that a user requested. That isn't practical, but if there is benefit to users for a relatively small increase in the complexity of the library, it's possibly worth taking. The reason I started working on this library is that I had written some code using the new pipelines and span APIs in netcore and had a really lean, fast project, and then when I started trying to get that data in and out of a database, using this library obliterated all the careful performance work I'd done. So yes, these are smaller gains, but they're still of value if you want to support high performance scenarios.

@cheenamalhotra cheenamalhotra added this to the 2.1.0-preview1 milestone Jul 16, 2020
@cheenamalhotra (Member)

@Wraith2 seems there are conflicts with recent merge of PR #499 - please resolve when you have time.

@Wraith2 (Contributor, Author) commented Jul 16, 2020

Merged and fixed up.

@cheenamalhotra cheenamalhotra merged commit 7b82e07 into dotnet:master Jul 21, 2020
@Wraith2 Wraith2 deleted the perf-registerclosestate branch August 7, 2020 21:30
panoskj added a commit to panoskj/SqlClient that referenced this pull request Aug 26, 2022
panoskj added a commit to panoskj/SqlClient that referenced this pull request Nov 21, 2022
panoskj added a commit to panoskj/SqlClient that referenced this pull request Nov 21, 2022
panoskj added a commit to panoskj/SqlClient that referenced this pull request Nov 21, 2022
panoskj added a commit to panoskj/SqlClient that referenced this pull request Nov 22, 2022